GBIF Data Processing and Validation

Authors

Abstract

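GBIF (Global Biodiversity Information Facility) is the largest data aggregator of biological occurrences in the world. GBIF was officially established in 2001 and has since aggregated 1.8 billion occurrence records from almost 2000 publishers. GBIF relies heavily on Darwin Core (DwC) for organising the data it receives.

GBIF Data Processing Pipelines

Every single occurrence record that gets published to GBIF goes through a series of three processing steps until it becomes available on GBIF.org:

1. source downloading
2. parsing into verbatim occurrences
3. interpreting verbatim values

Once all records are available in the standard verbatim form, they go through a set of interpretations. In 2018, GBIF processing underwent a significant rewrite in order to improve speed and maintainability. One of the main goals of this rewrite was to improve the consistency between GBIF's processing and that of the Living Atlases. In connection with this, GBIF's current data validator fell out of sync with GBIF pipelines processing.

New GBIF Data Validator

The current GBIF data validator is a service that allows anyone with a GBIF-relevant dataset to receive a report on the syntactical correctness and the validity of the content contained within the dataset. By submitting a dataset to the validator, users can go through the validation and interpretation procedures usually associated with publishing in GBIF and quickly determine potential issues in the data, without having to publish it. GBIF is planning to rework the current validator because it does not exactly match current GBIF pipelines processing.

Planned Changes

- The new validator will match the processing of the GBIF pipelines project.
- Validations will be saved and show up on user pages in a similar way to how downloads and derived datasets appear now (no more bookmarking validations!).
- A downloadable report of the issues found will be produced.

Suggested Changes/Ideas

One of the main guiding philosophies for the new validator user interface will be avoiding information overload. The current validator is often quite verbose in its feedback, highlighting data issues that may or may not be fixable or particularly important. The new validator will:

- generate a map of record geolocations;
- give users issues in order of importance;
- give "What", "Where", "When" flags priority;
- give some possible solutions or suggested fixes for flagged records.

We see the hosted portal environment as a way to quickly implement a pre-publication validation environment that is interactive and visual.

Potential New Data Quality Flags

The GBIF team has been compiling a list of new data quality flags. Not all of the suggested flags are easy to implement, so GBIF cannot promise that they will get implemented, even if they are a great idea. The advantage of the new processing pipelines is that any new data quality flag or processing step in the pipelines will also be available to the data validator.

Easy flags:

- country centroid flag: Country/province centroids are a known data quality problem.
- zero coordinate flag: Sometimes publishers leave either the latitude or longitude field as zero when it should have been left blank or NULL.
- default coordinate uncertainty in meters flag: Sometimes a default value or code is used for dwc:coordinateUncertaintyInMeters, which might indicate that the value is incorrect. This is especially the case for the values 301, 3036, 999, 9999.
- no higher taxonomy flag: Often publishers will leave out the higher taxonomy of a record. This can cause problems for matching to the GBIF backbone taxonomy.
- null coordinate uncertainty in meters flag: There has been some discussion that GBIF should encourage publishers to fill in dwc:coordinateUncertaintyInMeters. This is because every record, even ones taken from a Global Positioning System (GPS) reading, has an associated dwc:coordinateUncertaintyInMeters.

It is also nice when a data quality flag has an escape hatch, such that a data publisher can get rid of false positives or remove a flag by filling in a value.

Batch-type validations that are doable in pipelines, but probably not in the validator, include:

- outlier: Outliers are a known data quality problem. There are generally two types of outliers: environmental outliers and distance outliers. Currently GBIF does not flag either type of outlier.
- sensitive species: A sensitive species flag would mark a record where the species is considered vulnerable in some way. Usually this is due to a poaching threat or because the species is only known from one area.
- gridded dataset: Rasterized or gridded datasets are common on GBIF. These are datasets where location information is pinned to a low-resolution grid. This is already available with an experimental API (Application Programming Interface).

Conclusion

Data quality and data processing are moving targets. Variable source data will always be an issue when aggregating large amounts of data. With GBIF's new processing architecture, we hope that new features and data quality flags can be added more easily. Time and staffing resources are always in short supply, so we plan to prioritise the feedback we give to publishers, in order for them to work on correcting the most important and fixable issues. With new GBIF projects like the vocabulary server, we also hope that GBIF data processing can have more community participation.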
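
As an illustration only, the record-level checks that the abstract lists as easy (zero coordinate, default and null coordinate uncertainty, no higher taxonomy) could be sketched in Python as below. This is not GBIF's pipelines code: the function `easy_flags`, the flag names, and the choice of higher-taxonomy terms are assumptions made for this sketch. Only the Darwin Core term names and the four default values (301, 3036, 999, 9999) come from Darwin Core and the abstract. The sketch also assumes that a missing uncertainty is only worth flagging when both coordinates are present.

```python
from typing import Optional

# Values the abstract names as likely defaults for
# dwc:coordinateUncertaintyInMeters.
DEFAULT_UNCERTAINTY_VALUES = {301, 3036, 999, 9999}

# Darwin Core higher-taxonomy terms checked here (an illustrative choice).
HIGHER_TAXONOMY_TERMS = ("kingdom", "phylum", "class", "order", "family")


def _to_float(value: Optional[str]) -> Optional[float]:
    """Parse a verbatim string as a float; None if blank or unparseable."""
    if value is None or value.strip() == "":
        return None
    try:
        return float(value)
    except ValueError:
        return None


def easy_flags(record: dict) -> list:
    """Return the hypothetical flag names raised by one verbatim record."""
    flags = []
    lat = _to_float(record.get("decimalLatitude"))
    lon = _to_float(record.get("decimalLongitude"))
    unc = _to_float(record.get("coordinateUncertaintyInMeters"))

    # Zero in either field often stands in for a missing value.
    if lat == 0 or lon == 0:
        flags.append("ZERO_COORDINATE")

    if unc is None:
        # Assumption: only flag a missing uncertainty for georeferenced records.
        if lat is not None and lon is not None:
            flags.append("NULL_COORDINATE_UNCERTAINTY")
    elif unc in DEFAULT_UNCERTAINTY_VALUES:
        flags.append("DEFAULT_COORDINATE_UNCERTAINTY")

    # Flag records in which every higher-taxonomy term is missing or blank.
    if not any((record.get(t) or "").strip() for t in HIGHER_TAXONOMY_TERMS):
        flags.append("NO_HIGHER_TAXONOMY")

    return flags
```

Each check here has the kind of escape hatch the abstract mentions: a publisher can clear a flag by filling in or correcting the relevant field. The country centroid flag is left out because it would need a reference list of centroid coordinates, which the abstract does not specify.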

Similar resources

Data collection, processing, validation, and verification.

The collection, processing, validation, verification, formatting, filing, and storage of the required input data are some of the most important components in the National Institute for Occupational Safety and Health (NIOSH) Radiation Dose Reconstruction Program. Without question, the quality and scientific validity of the reconstructed dose estimates are totally dependent on these aspects of th...

Analysis of Pre-processing and Post-processing Methods and Using Data Mining to Diagnose Heart Diseases

Today, a great deal of data is generated in the medical field. Acquiring useful knowledge from this raw data requires data processing and detection of meaningful patterns and this objective can be achieved through data mining. Using data mining to diagnose and prognose heart diseases has become one of the areas of interest for researchers in recent years. In this study, the literature on the ap...

Processing and Validation of Intermediate Energy Evaluated Data Files

Pursuant to Article 1 of the Convention signed in Paris on 14th December 1960, and which came into force on 30th September 1961, the Organisation for Economic Cooperation and Development (OECD) shall promote policies designed: − to achieve the highest sustainable economic growth and employment and a rising standard of living in Member countries, while maintaining financial stability, and thus t...

Meeting Report: GBIF hackathon-workshop on Darwin Core and sample data (22-24 May 2013)

Museum of Vertebrate Zoology University of California Berkeley, CA USA Global Biodiversity Information Facility, GBIF Secretariat, Copenhagen, Denmark California Academy of Sciences, San Francisco, USA The University of California at Berkeley, Berkeley Natural History Museums, Berkeley, California, USA Botanic Garden & Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Berlin, Germany GB...

Designing, validation, and reliability assessment of software to acquire kinematics parameters of motion by image processing

Motion analysis systems are useful and effective equipment in biomechanics research. Unfortunately, these systems are available to few researchers because they are expensive. The aim of this study was to design and validate practical and inexpensive software to determine the exact position of markers in space and compute the kinematics of movement. In designing the software, the ...

Journal

Journal title: Biodiversity Information Science and Standards

Year: 2021

ISSN: 2535-0897

DOI: https://doi.org/10.3897/biss.5.75686